


X0178 



METHOD FOR SIMULTANEOUS IDENTIFICATION
OF DIFFERENTIALLY EXPRESSED mRNAs AND MEASUREMENT
OF RELATIVE CONCENTRATIONS



by

Dr. Mark G. Erlander & Dr. J. Gregor Sutcliffe



GOVERNMENT RIGHTS 



The research underlying this invention has been 
funded by the National Institutes of Health, Grant No. 
NS22347/GM32355. The government may have certain rights
in this invention. 

BACKGROUND OF THE INVENTION

This invention is directed to methods for 
simultaneous identification of differentially expressed 
mRNAs, as well as measurements of their relative 
concentrations.




An ultimate goal of biochemical research ought 
to be a complete characterization of the protein 
molecules that make up an organism. This would include 
their identification, sequence determination, 
demonstration of their anatomical sites of expression, 
elucidation of their biochemical activities, and 
understanding of how these activities determine 
organismic physiology. For medical applications, the 
description should also include information about how the 
concentration of each protein changes in response to 
pharmaceutical or toxic agents. 

Let us consider the scope of the problem: How
many genes are there? The issue of how many genes are
expressed in a mammal is still unsettled after at least 




two decades of study. There are few direct studies that 
address patterns of gene expression in different tissues. 
Mutational load studies (J.O. Bishop, "The Gene Numbers
Game," Cell 2:81-86 (1974); T. Ohta & M. Kimura,
"Functional Organization of Genetic Material as a Product
of Molecular Evolution," Nature 233:118-119 (1971)) have
suggested that there are between 3 x 10^4 and 10^5 essential
genes.

Before cDNA cloning techniques, information on 
gene expression came from RNA complexity studies: analog 
measurements (measurements in bulk) based on observations 
of mixed populations of RNA molecules with different 
specificities and abundances. To an unexpected extent,
early analog complexity studies were distorted by hidden 
complications of the fact that the molecules in each 
tissue that make up most of its mRNA mass comprise only a 
small fraction of its total complexity. Later, cDNA 
cloning allowed digital measurements (i.e., sequence- 
specific measurements on individual species) to be made; 
hence, more recent concepts about mRNA expression are 
based upon actual observations of individual RNA species. 

Brain, liver, and kidney are the mammalian 
tissues that have been most extensively studied by analog 
RNA complexity measurements. The lowest estimates of 
complexity are those of Hastie and Bishop (N.D. Hastie &
J.O. Bishop, "The Expression of Three Abundance Classes
of Messenger RNA in Mouse Tissues," Cell 9:761-774
(1976)), who suggested that 26 x 10^6 nucleotides of the
3 x 10^9 base pair rodent genome were expressed in brain,
23 x 10^6 in liver, and 22 x 10^6 in kidney, with nearly
complete overlap in RNA sets. This indicates a very
small number of tissue-specific mRNAs. However,
experience has shown that these values must clearly be
underestimates, because many mRNA molecules, which were



probably of abundances below the detection limits of this 
early study, have been shown to be expressed in brain but
detectable in neither liver nor kidney. Many other
researchers (J.A. Bantle & W.E. Hahn, "Complexity and
Characterization of Polyadenylated RNA in the Mouse
Brain," Cell 8:139-150 (1976); D.M. Chikaraishi,
"Complexity of Cytoplasmic Polyadenylated and Non-
Adenylated Rat Brain Ribonucleic Acids," Biochemistry
18:3249-3256 (1979)) have measured analog complexities of
between 100 x 10^6 and 200 x 10^6 nucleotides in brain, and 2-to-3-fold
lower estimates in liver and kidney. Of the brain mRNAs, 
50-65% are detected in neither liver nor kidney. These 
values have been supported by digital cloning studies 
(R.J. Milner & J.G. Sutcliffe, "Gene Expression in Rat 
Brain," Nucl. Acids Res. 11:5497-5520 (1983)). 

Analog measurements on bulk mRNA suggested that 
the average mRNA length was between 1400-1900 
nucleotides. In a systematic digital analysis of brain 
mRNA length using 200 randomly selected brain cDNAs to 
measure RNA size by northern blotting (Milner & 
Sutcliffe, supra), it was found that, when the mRNA size
data were weighted for RNA prevalence, the average length
was 1790 nucleotides, the same as that determined by
analog measurements. However, the mRNAs that made up
most of the brain mRNA complexity had an average length 
of 5000 nucleotides. Not only were the rarer brain RNAs 
longer, but they tended to be brain specific, while the 
more prevalent brain mRNAs were more ubiquitously 
expressed and were much shorter on average. 
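
The difference between the two averages can be illustrated with a toy calculation. The population below is invented purely for illustration and is not data from the studies cited; it merely shows how a few abundant short species can dominate a prevalence-weighted mean while many rare long species dominate the per-species mean.

```python
# Invented toy population: (length in nucleotides, copies per cell).
# Two abundant short species plus fifty rare long species.
species = [(1500, 1000), (1800, 500)] + [(5000, 2)] * 50

total_copies = sum(copies for _, copies in species)

# Mean length weighted by prevalence (what a bulk, "analog" measurement sees).
prevalence_weighted = sum(length * copies for length, copies in species) / total_copies

# Mean length counting each distinct species once (what dominates complexity).
per_species = sum(length for length, _ in species) / len(species)

print(round(prevalence_weighted), round(per_species))
```

In this toy case the prevalence-weighted mean stays below 2000 nucleotides while the per-species mean approaches 5000, which is the qualitative pattern described above.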

These concepts about mRNA lengths have been 
corroborated more recently from the length of brain mRNA 
whose sequences have been determined (J.G. Sutcliffe,
"mRNA in the Mammalian Central Nervous System," Annu.
Rev. Neurosci. 11:157-198 (1988)). Thus, the 1-2 x 10^8



nucleotide complexity and 5000-nucleotide average mRNA 
length calculates to an estimated 30,000 mRNAs expressed 
in the brain, of which about 2/3 are not detected in 
liver or kidney. Brain apparently accounts for a 
considerable portion of the tissue-specific genes of 
mammals. Most brain mRNAs are expressed at low
concentration. There are no total-mammal mRNA complexity 
measurements, nor is it yet known whether 5000 
nucleotides is a good mRNA-length estimate for non-neural 
tissues. A reasonable estimate of total gene number 
might be between 50,000 and 100,000. 
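
The brain estimate above is simple arithmetic and can be checked with a short sketch. The figures are those quoted in the text; the function and variable names are illustrative only.

```python
def estimated_mrna_count(complexity_nt, mean_length_nt):
    """Number of distinct mRNAs implied by a sequence complexity and a mean length."""
    return complexity_nt / mean_length_nt

low = estimated_mrna_count(1e8, 5000)    # lower end of the 1-2 x 10^8 range
high = estimated_mrna_count(2e8, 5000)   # upper end of the range
midpoint = (low + high) / 2              # the "estimated 30,000" figure

# About 2/3 of brain mRNAs are not detected in liver or kidney.
not_in_liver_or_kidney = midpoint * 2 / 3

print(low, high, midpoint, not_in_liver_or_kidney)
```

The range of 20,000 to 40,000 brackets the 30,000 figure, of which roughly 20,000 would be undetected in liver or kidney.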

What is most needed to advance a chemical
understanding of physiological function is a menu of
protein sequences encoded by the genome plus the cell
types in which each is expressed. At present, protein
sequences can be reliably deduced only from cDNAs, not
from genes, because of the presence of the intervening 
sequences (introns) in the genomic sequences. Even the 
complete nucleotide sequence of a mammalian genome will 
not substitute for characterization of its expressed 
sequences. Therefore, a systematic strategy for 
collecting transcribed sequences and demonstrating their 
sites of expression is needed. Such a strategy would be 
of particular use in determining sequences expressed 
differentially within the brain. It is necessarily an 
eventual goal of such a study to achieve closure; that 
is, to identify all mRNAs. Closure can be difficult to 
obtain due to the differing prevalence of various mRNAs 
and the large number of distinct mRNAs expressed by many 
distinct tissues. The effort to obtain it allows one to 
obtain a progressively more reliable description of the 
dimensions of gene space. 

Studies carried out in the laboratory of Craig 
Venter (M.D. Adams et al., "Complementary DNA Sequencing:
Expressed Sequence Tags and Human Genome Project,"
Science 252:1651-1656 (1991); M.D. Adams et al.,
"Sequence Identification of 2,375 Human Brain Genes,"
Nature 355:632-634 (1992)) have resulted in the isolation
of randomly chosen cDNA clones of human brain mRNAs, the
determination of short single-pass sequences of their 3'- 
ends, about 300 base pairs, and a compilation of some 
2500 of these as a database of "expressed sequence tags." 
This database, while useful, fails to provide any 
knowledge of differential expression. It is therefore 
important to be able to recognize genes based on their 
overall pattern of expression within regions of brain and 
other tissues and in response to various paradigms, such 
as various physiological or pathological states or the 
effects of drug treatment, rather than simply their 
expression in a single tissue. 

Other work has focused on the use of the 
polymerase chain reaction (PCR) to establish a database.
Williams et al. (J.G.K. Williams et al., "DNA
Polymorphisms Amplified by Arbitrary Primers Are Useful
as Genetic Markers," Nucl. Acids Res. 18:6531-6535
(1990)) and Welsh & McClelland (J. Welsh & M. McClelland,
"Genomic Fingerprinting Using Arbitrarily Primed PCR and
a Matrix of Pairwise Combinations of Primers," Nucl.
Acids Res. 18:7213-7218 (1990)) showed that single 10-mer
primers of arbitrarily chosen sequences, i.e., any 10-mer
primer off the shelf, when used for PCR with complex DNA
templates such as human, plant, yeast, or bacterial
genomic DNA, gave rise to an array of PCR products. The
priming events were demonstrated to involve incomplete 
complementarity between the primer and the template DNA. 
Presumably, partially mismatched primer-binding sites are 
randomly distributed through the genome. Occasionally, 
two of these sites in opposing orientation were located 
closely enough together to give rise to a PCR product 



band. There were on average 8-10 products, which
varied in size from about 0.4 to about 4 kb and had different
mobilities for each primer. The array of PCR products
exhibited differences among individuals of the same
species. These authors proposed that the single 
arbitrary primers could be used to produce restriction 
fragment length polymorphism (RFLP)-like information for
genetic studies. Others have applied this technology
(S.R. Woodward et al., "Random Sequence Oligonucleotide
Primers Detect Polymorphic DNA Products Which Segregate
in Inbred Strains of Mice," Mamm. Genome 3:73-78 (1992);
J.H. Nadeau et al., "Multilocus Markers for Mouse Genome
Analysis: PCR Amplification Based on Single Primers of
Arbitrary Nucleotide Sequence," Mamm. Genome 3:55-64
(1992)). Two groups (J. Welsh et al., "Arbitrarily
Primed PCR Fingerprinting of RNA," Nucl. Acids Res.
20:4965-4970 (1992); P. Liang & A.B. Pardee,
"Differential Display of Eukaryotic Messenger RNA by
Means of the Polymerase Chain Reaction," Science
257:967-971 (1992)) adapted the method to compare mRNA
populations. In the study of Liang and Pardee, this
method, called mRNA differential display, was used to
compare the population of mRNAs expressed by two related
cell types, normal and tumorigenic mouse A31 cells. For
each experiment, they used an arbitrary 10-mer as the 5'
primer and an oligonucleotide complementary to a
subset of poly(A) tails as a 3' anchor primer, performing
PCR amplification on cDNAs prepared from the two cell
types. The products were resolved on sequencing gels, and
50-100 bands ranging from 100-500 nucleotides were
observed. The bands presumably resulted from
amplification of cDNAs corresponding to the
3' ends of mRNAs that contain the complement of the 3'
anchor primer and a partially mismatched 5' primer site,
as had been observed on genomic DNA templates. For each
primer pair, the pattern of bands amplified from the two 
cDNAs was similar, with the intensities of about 80% of 
the bands being indistinguishable. Some of the bands
were more intense in one or the other of the PCR samples; 
a few were detected in only one of the two samples. 

Further studies (P. Liang et al., "Distribution 
and Cloning of Eukaryotic mRNAs by Means of Differential 
Display: Refinements and Optimization," Nucl. Acids Res. 
21:3269-3275 (1993)) have demonstrated that the procedure
works with low concentrations of input RNA (although it
is not quantitative for rarer species), and the
specificity resides primarily in the last nucleotide of 
the 3' anchor primer. At least a third of identified 
differentially detected PCR products correspond to 
differentially expressed RNAs, with a false positive rate 
of at least 25%. 




If all of the 50,000 to 100,000 mRNAs of the 
mammal were accessible to this arbitrary-primer PCR
approach, then about 80-95 5' arbitrary primers and 12 3' 
anchor primers would be required in about 1000 PCR panels 
and gels to give a likelihood, calculated by the Poisson 
distribution, that about two-thirds of these mRNAs would 
be identified. 
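
One way to read this estimate is sketched below. It assumes, purely for illustration, that each panel displays on the order of 50 to 100 bands and that every band is drawn independently and uniformly from the pool of mRNAs; neither assumption is stated in the text. Under those assumptions the number of bands produced by a given mRNA is approximately Poisson distributed, and the fraction of mRNAs seen at least once is 1 - exp(-mean).

```python
import math

def fraction_detected(n_panels, bands_per_panel, n_mrnas):
    """Poisson probability that a given mRNA yields at least one band."""
    mean_hits = n_panels * bands_per_panel / n_mrnas
    return 1.0 - math.exp(-mean_hits)

# 80-95 arbitrary 5' primers combined with 12 anchor primers: about 1000 panels.
panels_low, panels_high = 80 * 12, 95 * 12

# Hypothetical band counts, chosen so that each mRNA is hit once on average.
frac_a = fraction_detected(1000, 50, 50_000)
frac_b = fraction_detected(1000, 100, 100_000)

print(panels_low, panels_high, round(frac_a, 3), round(frac_b, 3))
```

With a mean of one hit per mRNA the detected fraction is 1 - 1/e, or about 0.63, which is close to the two-thirds quoted above.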



It is unlikely that all mRNAs are amenable to
detection by this method for the following reasons. For
an mRNA to surface in such a survey, it must be prevalent
enough to produce a signal on the autoradiograph and
contain a sequence in its 3' 500 nucleotides capable of
serving as a site for mismatched primer binding and
priming. The more prevalent an individual mRNA species,
the more likely it would be to generate a product. Thus,
prevalent species may give bands with many different
arbitrary primers. Because this latter property would
contain an unpredictable element of chance based on
selection of the arbitrary primers, it would be difficult
to approach closure by the arbitrary primer method.

[...] information to be portable from one [...]
mismatched [...] different PCR machines, with [...]
reaction conditions. As [...] poorly understood, [...]
a database from data [...] differential display [...]


There is therefore a need for an improved 
method of differential display of mRNA species that 
reduces the uncertain aspect of 5'-end generation and
allows data to be absolutely reproducible in different 
settings. Preferably, such a method does not depend on 
potentially irreproducible mismatched priming. 
Preferably, such a method reduces the number of PCR 
panels and gels required for a complete survey and allows 
double-strand sequence data to be rapidly accumulated. 
Preferably, such an improved method also reduces, if not 
eliminates, the number of concurrent signals obtained 
from the same species of mRNA. 




SUMMARY 

We have developed an improved method for the 
simultaneous sequence-specific identification of mRNAs in
a mRNA population. In general, this method comprises: 

(1) preparing double-stranded cDNAs from a mRNA
population using a mixture of 12 anchor primers, the
anchor primers each including: (i) a tract of from 7 to
40 T residues; (ii) a site for cleavage by a restriction
endonuclease that recognizes more than six bases, the
site for cleavage being located to the 5'-side of the
tract of T residues; (iii) a stuffer segment of from 4 to
40 nucleotides, the stuffer segment being located to the
5'-side of the site for cleavage by the restriction
endonuclease; and (iv) phasing residues -V-N located at
the 3' end of each of the anchor primers, wherein V is a
deoxyribonucleotide selected from the group consisting of
A, C, and G; and N is a deoxyribonucleotide selected from
the group consisting of A, C, G, and T, the mixture
including anchor primers containing all possibilities for
V and N;

(2) producing cloned inserts from a suitable 
host cell that has been transformed by a vector, the 
vector having the cDNA sample that has been cleaved with 
a first restriction endonuclease and a second restriction 
endonuclease inserted therein, the cleaved cDNA sample 
being inserted in the vector in an orientation that is 
antisense with respect to a bacteriophage-specific
promoter within the vector, the first restriction
endonuclease recognizing a four-nucleotide sequence and
the second restriction endonuclease cleaving at a single 
site within each member of the mixture of anchor primers; 

(3) generating linearized fragments of the 
cloned inserts by digestion with at least one restriction 
endonuclease that is different from the first and second 
restriction endonucleases; 

(4) generating a cRNA preparation of antisense 
cRNA transcripts by incubation of the linearized 
fragments with a bacteriophage-specific RNA polymerase
capable of initiating transcription from the
bacteriophage-specific promoter;

(5) dividing the cRNA preparation into sixteen
subpools and transcribing first-strand cDNA from each
subpool, using a thermostable reverse transcriptase and
one of sixteen primers whose 3'-terminus is -N-N, wherein
N is one of the four deoxyribonucleotides A, C, G, or T,
the primer being at least 15 nucleotides in length,
corresponding in sequence to the 3'-end of the
bacteriophage-specific promoter, and extending across
into at least the first two nucleotides of the cRNA, the
mixture including all possibilities for the 3'-terminal
two nucleotides;

(6) using the product of transcription in each
of the sixteen subpools as a template for a polymerase
chain reaction with a 3'-primer that corresponds in
sequence to a sequence in the vector adjoining the site
of insertion of the cDNA sample in the vector and a 5'-
primer selected from the group consisting of: (i) the
primer from which first-strand cDNA was made for that
subpool; (ii) the primer from which the first-strand cDNA
was made for that subpool extended at its 3'-terminus by
an additional residue -N, where N can be any of A, C, G,
or T; and (iii) the primer used for the synthesis of
first-strand cDNA for that subpool extended at its 3'-
terminus by two additional residues -N-N, wherein N can
be any of A, C, G, or T, to produce polymerase chain
reaction amplified fragments; and

(7) resolving the polymerase chain reaction
amplified fragments by electrophoresis to display bands
representing the 3'-ends of mRNAs present in the sample.

Typically, the anchor primers each have 18 T 
residues in the tract of T residues, and the stuffer
segment of the anchor primers is 14 residues in length. 
A suitable sequence for the stuffer segment is A-A-C-T-G- 
G-A-A-G-A-A-T-T-C (SEQ ID NO: 1). 

Typically, the site for cleavage by a
restriction endonuclease that recognizes more than six
bases is the NotI cleavage site. In this case, suitable
anchor primers have the sequence A-A-C-T-G-G-A-A-G-A-A-T-
T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-T-T-T-T-T-T-
T-T-T-T-V-N (SEQ ID NO: 2).
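
As a minimal sketch, the 12 anchor primers covered by SEQ ID NO: 2 can be enumerated by expanding the V and N positions. The sequence is taken as it is written out in full in the primer panels listed later in this summary; the constant names, including the label SPACER for the five residues between the NotI site and the T tract, are illustrative and not terms used in the text.

```python
from itertools import product

STUFFER = "AACTGGAAGAATTC"   # SEQ ID NO: 1, 14 residues
NOT_I_SITE = "GCGGCCGC"      # eight-base NotI recognition sequence
SPACER = "AGGAA"             # residues between the NotI site and the T tract
T_TRACT = "T" * 18           # 18 T residues

def anchor_primers():
    """Expand ...T(18)-V-N into concrete sequences (V in A/C/G, N in A/C/G/T)."""
    core = STUFFER + NOT_I_SITE + SPACER + T_TRACT
    return [core + v + n for v, n in product("ACG", "ACGT")]

primers = anchor_primers()
print(len(primers), len(primers[0]))
for primer in primers[:3]:
    print(primer)
```

Excluding T at the V position is what forces each primer to anneal at the junction between the poly(A) tail and the preceding sequence rather than anywhere within the tail.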

Typically, the bacteriophage-specific promoter
is selected from the group consisting of T3 promoter and 
T7 promoter. Most typically, it is the T3 promoter. 

Typically, the sixteen primers for priming of 
transcription of cDNA from cRNA have the sequence A-G-G- 
T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 3).

The vector can be the plasmid pBC SK+ cleaved
with ClaI and NotI, in which case the 3'-primer in step
(6) can be G-A-A-C-A-A-A-A-G-C-T-G-G-A-G-C-T-C-C-A-C-C-G-
C (SEQ ID NO: 4).

The first restriction endonuclease recognizing
a four-nucleotide sequence is typically MspI;
alternatively, it can be TaqI or HinP1I. The restriction
endonuclease cleaving at a single site in each of the
mixture of anchor primers is typically NotI.

Typically, the mRNA population has been 
enriched for polyadenylated mRNA species. 

A typical host cell is a strain of Escherichia coli.

The step of generating linearized fragments of 
the cloned inserts typically comprises: 

(a) dividing the plasmid containing the
insert into two fractions, a first fraction cleaved with
the restriction endonuclease XhoI and a second fraction
cleaved with the restriction endonuclease SalI;

(b) recombining the first and second
fractions after cleavage;

(c) dividing the recombined fractions into
thirds and cleaving the first third with the restriction
endonuclease HindIII, the second third with the
restriction endonuclease BamHI, and the third third with
the restriction endonuclease EcoRI; and

(d) recombining the thirds after digestion 
in order to produce a population of linearized fragments 
of which about one-sixth of the population corresponds to 
the product of cleavage by each of the possible 
combinations of enzymes. 
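
The one-sixth figure follows from the two rounds of splitting, as the sketch below shows. It assumes the fractions are divided equally at each round, which the text implies but does not state.

```python
from fractions import Fraction
from itertools import product

first_round = ["XhoI", "SalI"]                 # plasmid divided into two fractions
second_round = ["HindIII", "BamHI", "EcoRI"]   # recombined, then divided into thirds

# Share of the final population carried by each pair of enzymes.
shares = {
    (first, second): Fraction(1, len(first_round)) * Fraction(1, len(second_round))
    for first, second in product(first_round, second_round)
}

for combo, share in shares.items():
    print(combo, share)
```

Six enzyme pairs result, each accounting for 1/6 of the linearized population.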

Typically, the step of resolving the polymerase
chain reaction amplified fragments by electrophoresis
comprises electrophoresis of the fragments on at least
two gels.

The method can further comprise determining the 
sequence of the 3'-end of at least one of the mRNAs, such
as by:

(1) eluting at least one cDNA corresponding to
a mRNA from an electropherogram in which bands
representing the 3'-ends of mRNAs present in the sample
are displayed; 

(2) amplifying the eluted cDNA in a polymerase 
chain reaction; 

(3) cloning the amplified cDNA into a plasmid; 

(4) producing DNA corresponding to the cloned 
DNA from the plasmid; and 

(5) sequencing the cloned cDNA. 

Another aspect of the invention is a method of 
simultaneous sequence-specific identification of mRNAs 
corresponding to members of an antisense cRNA pool 
representing the 3'-ends of a population of mRNAs, the
antisense cRNAs that are members of the antisense cRNA
pool being terminated at their 5'-end with a primer
sequence corresponding to a bacteriophage-specific vector
and at their 3'-end with a sequence corresponding in
sequence to a sequence of the vector. The method
comprises:

(1) dividing the members of the antisense cRNA 
pool into sixteen subpools and transcribing first-strand 
cDNA from each subpool, using a thermostable reverse 
transcriptase and one of sixteen primers whose 3'- 
terminus is -N-N, wherein N is one of the four 
deoxyribonucleotides A, C, G, or T, the primer being at 
least 15 nucleotides in length, corresponding in sequence 
to the 3'-end of the bacteriophage-specific promoter, and
extending across into at least the first two nucleotides
of the cRNA, the mixture including all possibilities for
the 3'-terminal two nucleotides;

(2) using the product of transcription in each 
of the sixteen subpools as a template for a polymerase 
chain reaction with a 3'-primer that corresponds in
sequence to a sequence in the vector adjoining the site of
insertion of the cDNA sample in the vector and a 5'-
primer selected from the group consisting of: (i) the
primer from which first-strand cDNA was made for that
subpool; (ii) the primer from which the first-strand cDNA
was made for that subpool extended at its 3'-terminus by
an additional residue -N, where N can be any of A, C, G, 
or T; and (iii) the primer used for the synthesis of 
first-strand cDNA for that subpool extended at its 3'- 
terminus by two additional residues -N-N, wherein N can 
be any of A, C, G, or T, to produce polymerase chain 
reaction amplified fragments; and 

(3) resolving the polymerase chain reaction 
amplified fragments by electrophoresis to display bands 
representing the 3'-ends of mRNAs present in the sample.




Yet another aspect of the present invention is
a method for detecting a change in the pattern of mRNA 
expression in a tissue associated with a physiological or 
pathological change. This method comprises the steps of: 

(1) obtaining a first sample of a tissue that 
is not subject to the physiological or pathological 
change; 

(2) determining the pattern of mRNA expression 
in the first sample of the tissue by performing steps 
(1)-(3) of the method described above for simultaneous
sequence-specific identification of mRNAs corresponding
to members of an antisense cRNA pool representing the 3'-
ends of a population of mRNAs to generate a first display
of bands representing the 3'-ends of mRNAs present in the
first sample; 

(3) obtaining a second sample of the tissue 
that has been subject to the physiological or 
pathological change; 

(4) determining the pattern of mRNA expression 
in the second sample of the tissue by performing steps 
(1)-(3) of the method described above for simultaneous
sequence-specific identification of mRNAs corresponding 
to members of an antisense cRNA pool to generate a second 
display of bands representing the 3'-ends of mRNAs 
present in the second sample; and 

(5) comparing the first and second displays to 
determine the effect of the physiological or pathological 
change on the pattern of mRNA expression in the tissue.

The comparison is typically made in adjacent lanes.
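
The comparison step can be sketched in code under some assumptions that go beyond the text: each band is identified by the 3'-terminal residues of the 5'-primer and the fragment length, band intensities have been quantified in arbitrary units, and a two-fold difference is used as the cutoff. The function name, the cutoff, and the example values are all hypothetical.

```python
def compare_displays(first, second, min_fold=2.0):
    """Classify bands as 'up', 'down', 'gained' or 'lost' between two displays.

    Each display maps a band key (5'-primer suffix, fragment length in
    nucleotides) to a band intensity in arbitrary units. Bands whose
    intensity changes by less than min_fold are omitted from the result.
    """
    changes = {}
    for band in sorted(set(first) | set(second)):
        a = first.get(band, 0.0)
        b = second.get(band, 0.0)
        if a == 0.0 and b > 0.0:
            changes[band] = "gained"
        elif b == 0.0 and a > 0.0:
            changes[band] = "lost"
        elif b >= a * min_fold:
            changes[band] = "up"
        elif a >= b * min_fold:
            changes[band] = "down"
    return changes

# Invented example values, for illustration only.
control = {("AC", 212): 10.0, ("AC", 340): 4.0, ("GT", 150): 7.0}
treated = {("AC", 212): 11.0, ("AC", 340): 12.0, ("TT", 98): 3.0}

result = compare_displays(control, treated)
print(result)
```

Because both samples are processed with the same primers, a band key refers to the same mRNA in both lanes, which is what makes a lane-by-lane comparison meaningful.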

The tissue can be derived from the central 
nervous system or from particular structures within the 
central nervous system. The tissue can alternatively be 
derived from another organ or organ system. 

Another aspect of the present invention is a
method of screening for a side effect of a drug. The 
method can comprise the steps of: 

(1) obtaining a first sample of tissue from an
organism treated with a compound of known physiological
function;

(2) determining the pattern of mRNA expression
in the first sample of the tissue by performing steps
(1)-(3) of the method described above for simultaneous
sequence-specific identification of mRNAs corresponding
to members of an antisense cRNA pool to generate a first
display of bands representing the 3'-ends of mRNAs
present in the first sample;

(3) obtaining a second sample of tissue from
an organism treated with a drug to be screened for a side
effect;

(4) determining the pattern of mRNA expression
in the second sample of the tissue by performing steps
(1)-(3) of the method described above for simultaneous
sequence-specific identification of mRNAs corresponding
to members of an antisense cRNA pool to generate a second
display of bands representing the 3'-ends of mRNAs
present in the second sample; and

(5) comparing the first and second displays in
order to detect the presence of mRNA species whose
expression is not affected by the known compound but is
affected by the drug to be screened, thereby indicating a
difference in action of the drug to be screened and the
known compound and thus a side effect.

The drug to be screened can be a drug affecting 
the central nervous system, such as an antidepressant, a 
neuroleptic, a tranquilizer, an anticonvulsant, a 
monoamine oxidase inhibitor, or a stimulant. 
Alternatively, the drug can be another class of drug such
as an anti-parkinsonism agent, a skeletal muscle
relaxant, an analgesic, a local anesthetic, a
cholinergic, an antispasmodic, a steroid, or a non-
steroidal anti-inflammatory drug.

Another aspect of the present invention is 
panels of primers and degenerate mixtures of primers 
suitable for the practice of the present invention.
These include: 

(1) a panel of primers comprising 16 primers of 
the sequence A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID 
NO: 3), wherein N is one of the four deoxyribonucleotides 
A, C, G, or T; 

(2) a panel of primers comprising 64 primers of
the sequences A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N (SEQ
ID NO: 5), wherein N is one of the four
deoxyribonucleotides A, C, G, or T;

(3) a panel of primers comprising 256 primers
of the sequences A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N
(SEQ ID NO: 6), wherein N is one of the four
deoxyribonucleotides A, C, G, or T;

(4) a panel of primers comprising 12 primers 
of the sequences A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-
G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V-N
(SEQ ID NO: 2), wherein V is a deoxyribonucleotide
selected from the group consisting of A, C, and G; and N 
is a deoxyribonucleotide selected from the group 
consisting of A, C, G, and T; and 

(5) a degenerate mixture of primers comprising 
a mixture of 12 primers of the sequences A-A-C-T-G-G-A-A- 
G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-T- 
T-T-T-T-T-T-T-T-T-V-N (SEQ ID NO: 2), wherein V is a 
deoxyribonucleotide selected from the group consisting of 
A, C, and G; and N is a deoxyribonucleotide selected from 
the group consisting of A, C, G, and T, each of the 12 

primers being present in about an equimolar quantity. 
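
The sizes of the three 5'-primer panels follow from the number of variable 3'-terminal positions (4^2, 4^3, and 4^4). A minimal sketch, with illustrative function and variable names, enumerates them and shows how each of the 16 subpool primers is extended into 4 and then 16 longer primers:

```python
from itertools import product

BASE_5P = "AGGTCGACGGTATCGG"   # constant part shared by SEQ ID NO: 3, 5 and 6

def primer_panel(n_variable):
    """All primers formed by appending n_variable residues, each A, C, G or T."""
    return [BASE_5P + "".join(tail) for tail in product("ACGT", repeat=n_variable)]

def extensions(parent, panel):
    """Primers in a larger panel that extend a given shorter primer."""
    return [p for p in panel if p.startswith(parent)]

panel_16 = primer_panel(2)     # SEQ ID NO: 3
panel_64 = primer_panel(3)     # SEQ ID NO: 5
panel_256 = primer_panel(4)    # SEQ ID NO: 6

parent = BASE_5P + "AC"        # one of the 16 subpool primers
print(len(panel_16), len(panel_64), len(panel_256))
print(extensions(parent, panel_64))
```

Each added position splits the products of a subpool four ways, so longer primers trade a larger number of reactions for fewer bands per lane.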

These and other features, aspects, and
advantages of the present invention will become better
understood with reference to the following description,
appended claims, and accompanying drawings where:

Figure 1 is a diagrammatic depiction of the
method of the present invention showing the various
stages of priming, cleavage, cloning and amplification;
and

Figure 2 is an autoradiogram of a gel showing 
the result of performing the method of the present 
invention using several 5'-primers in the PCR step
corresponding to known sequences of brain mRNAs and using
liver and brain mRNA as starting material.

DESCRIPTION



We have developed a method for simultaneous
sequence-specific identification and display of mRNAs in
a mRNA population.

As discussed below, this method has a number of
applications in drug screening, the study of
physiological and pathological conditions, and genomic
mapping. These applications will be discussed below.

I. SIMULTANEOUS SEQUENCE-SPECIFIC IDENTIFICATION OF
mRNAs



A method according to the present invention, 
based on the polymerase chain reaction (PCR) technique, 
provides means for visualization of nearly every mRNA 
expressed by a tissue as a distinct band on a gel whose 
intensity corresponds roughly to the concentration of the
mRNA. The method is based on the observation that
virtually all mRNAs conclude with a 3'-poly(A) tail but
does not rely on the specificity of primer binding to the 
tail. 

In general, the method comprises: 

(1) preparing double-stranded cDNAs from a mRNA 
population using a mixture of 12 anchor primers, the 
anchor primers each including: (i) a tract of from 7 to 
40 T residues; (ii) a site for cleavage by a restriction 
endonuclease that recognizes more than six bases, the 
site for cleavage being located to the 5'-side of the 
tract of T residues; (iii) a stuffer segment of from 4 to
40 nucleotides, the stuffer segment being located to the
5'-side of the site for cleavage by the restriction 
endonuclease; and (iv) phasing residues -V-N located at 
the 3' end of each of the anchor primers, wherein V is a 
deoxyribonucleotide selected from the group consisting of 
A, C, and G; and N is a deoxyribonucleotide selected from 
the group consisting of A, C, G, and T, the mixture 
including anchor primers containing all possibilities for 
V and N; 

(2) producing cloned inserts from a suitable 
host cell that has been transformed by a vector, the 
vector having the cDNA sample that has been cleaved with 
a first restriction endonuclease and a second restriction 
endonuclease inserted therein, the cleaved cDNA sample 
being inserted in the vector in an orientation that is 
antisense with respect to a bacteriophage-specific
promoter within the vector, the first restriction
endonuclease recognizing a four-nucleotide sequence and
the second restriction endonuclease cleaving at a single 
site within each member of the mixture of anchor primers; 

(3) generating linearized fragments of the 
cloned inserts by digestion with at least one restriction
endonuclease that is different from the first and second
restriction endonucleases;

(4) generating a cRNA preparation of antisense 
cRNA transcripts by incubation of the linearized 
fragments with a bacteriophage-specific RNA polymerase
capable of initiating transcription from the
bacteriophage-specific promoter;

(5) dividing the cRNA preparation into sixteen 
subpools and transcribing first-strand cDNA from each
subpool, using a thermostable reverse transcriptase and
one of sixteen primers whose 3'-terminus is -N-N, wherein
N is one of the four deoxyribonucleotides A, C, G, or T,
the primer being at least 15 nucleotides in length,
corresponding in sequence to the 3'-end of the
bacteriophage-specific promoter, and extending across
into at least the first two nucleotides of the cRNA, the
mixture including all possibilities for the 3'-terminal
two nucleotides; 

(6) using the product of transcription in each 
of the sixteen subpools as a template for a polymerase 
chain reaction with a 3'-primer that corresponds in
sequence to a sequence in the vector adjoining the site
of insertion of the cDNA sample in the vector and a 5'-
primer selected from the group consisting of: (i) the
primer from which first-strand cDNA was made for that
subpool; (ii) the primer from which the first-strand cDNA
was made for that subpool extended at its 3'-terminus by
an additional residue -N, where N can be any of A, C, G, 
or T; and (iii) the primer used for the synthesis of 
first-strand cDNA for that subpool extended at its 3'- 
terminus by two additional residues -N-N, wherein N can 
be any of A, C, G, or T, to produce polymerase chain 
reaction amplified fragments; and 

(7) resolving the polymerase chain reaction
amplified fragments by electrophoresis to display bands
representing the 3'-ends of mRNAs present in the sample.

A depiction of this scheme is shown in Figure



A. Isolation of mRNA

The first step in the method is isolation or 
provision of a mRNA population. Methods of extraction of 
RNA are well-known in the art and are described, for 
example, in J. Sambrook et al., "Molecular Cloning: A 
Laboratory Manual" (Cold Spring Harbor Laboratory Press, 
Cold Spring Harbor, New York, 1989), vol. 1, ch. 7, 
"Extraction, Purification, and Analysis of Messenger RNA 
from Eukaryotic Cells," incorporated herein by this 
reference. Other isolation and extraction methods are 
also well-known. Typically, isolation is performed in 
the presence of chaotropic agents such as guanidinium 
chloride or guanidinium thiocyanate, although other 
detergents and extraction agents can alternatively be 
used. 

Typically, the mRNA is isolated from the total
extracted RNA by chromatography over oligo(dT)-cellulose
or other chromatographic media that have the capacity to
bind the polyadenylated 3'-portion of mRNA molecules.
Alternatively, but less preferably, total RNA can be
used. However, it is generally preferred to isolate
poly(A)+ RNA.

B. Preparation of Double-Stranded cDNA 

Double-stranded cDNAs are then prepared from
the mRNA population using a mixture of twelve anchor
primers to initiate reverse transcription. The anchor
primers each include: (i) a tract of from 7 to 40 T
residues; (ii) a site for cleavage by a restriction
endonuclease that recognizes more than six bases, the
site for cleavage being located to the 5'-side of the
tract of T residues; (iii) a stuffer segment of from 4 to
40 nucleotides, the stuffer segment being located to the
5'-side of the site for cleavage by the restriction
endonuclease; and (iv) phasing residues -V-N located at
the 3' end of each of the anchor primers, wherein V is a
deoxyribonucleotide selected from the group consisting of
A, C, and G; and N is a deoxyribonucleotide selected from
the group consisting of A, C, G, and T. The mixture
includes anchor primers containing all possibilities for
V and N.
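
The composition of the twelve-member anchor primer mixture can be sketched in a few lines of Python. This is only an illustration: the segment boundaries are read off SEQ ID NO: 2 as given in this description, and the names STUFFER, NOTI_SITE, SPACER, and anchor_primers are hypothetical labels introduced here, not terms used in the specification.

```python
from itertools import product

# Segments read off SEQ ID NO: 2; the variable names are illustrative only.
STUFFER = "AACTGGAAGAATTC"   # 14-residue stuffer segment (SEQ ID NO: 1)
NOTI_SITE = "GCGGCCGC"        # NotI recognition sequence
SPACER = "AGGAA"              # residues between the NotI site and the T tract
T_TRACT = "T" * 18            # tract of 18 T residues


def anchor_primers():
    """Return one primer per -V-N combination (V in A/C/G, N in A/C/G/T)."""
    return [
        STUFFER + NOTI_SITE + SPACER + T_TRACT + v + n
        for v, n in product("ACG", "ACGT")
    ]


primers = anchor_primers()
print(len(primers))   # number of distinct anchor primers in the mixture
print(primers[0])     # first member of the mixture
```

Excluding T at the V position is what gives 3 x 4 = 12 members rather than 16.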

Typically, the anchor primers each have 18 T
residues in the tract of T residues, and the stuffer
segment of the anchor primers is 14 residues in length.
A suitable sequence of the stuffer segment is A-A-C-T-G- 
G-A-A-G-A-A-T-T-C (SEQ ID NO: 1). Typically, the site 
for cleavage by a restriction endonuclease that 
recognizes more than six bases is the NotI cleavage site. 
A preferred set of anchor primers has the sequence A-A-C- 
T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T- 
T-T-T-T-T-T-T-T-T-T-T-T-T-T-V-N (SEQ ID NO: 2) . 

One member of this mixture of twelve anchor
primers initiates synthesis at a fixed position at the
3'-end of all copies of each mRNA species in the sample,
thereby defining a 3'-end point for each species.

This reaction is carried out under conditions 
for the preparation of double-stranded cDNA from mRNA 
that are well-known in the art. Such techniques are 
described, for example, in Volume 2 of J. Sambrook et 
al., "Molecular Cloning: A Laboratory Manual", entitled 
"Construction and Analysis of cDNA Libraries." 
Typically, reverse transcriptase from avian
myeloblastosis virus is used.

C. Cleavage of the cDNA Sample With Restriction
Endonucleases



The cDNA sample is cleaved with two restriction
endonucleases. The first restriction endonuclease is an
endonuclease that recognizes a 4-nucleotide sequence.
This typically cleaves at multiple sites in most cDNAs.
The second restriction endonuclease cleaves at a single
site within each member of the mixture of anchor primers.
Typically, the first restriction endonuclease is MspI and
the second restriction endonuclease is NotI. The enzyme
NotI does not cleave within most cDNAs. This is desirable
to minimize the loss of cloned inserts that would result
from cleavage of the cDNAs at locations other than in the
anchor site.

Alternatively, the first restriction
endonuclease can be TaqI or HinP1I. The use of the
latter two restriction endonucleases can detect rare
mRNAs that are not cleaved by MspI. The first
restriction endonuclease generates a 5'-overhang
compatible for cloning into the desired vector, as
discussed below. This cloning, for the pBC SK+ vector,
is into the ClaI site, as discussed below.

Conditions for digestion of the cDNA are well- 
known in the art and are described, for example, in J. 
Sambrook et al., "Molecular Cloning: A Laboratory 
Manual," Vol. 1, Ch. 5, "Enzymes Used in Molecular
Cloning."

D. Insertion of Cleaved cDNA into a Vector 

The cDNA sample cleaved with the first and
second restriction endonucleases is then inserted into a
vector. A suitable vector is the plasmid pBC SK+ that
has been cleaved with the restriction endonucleases ClaI
and NotI. The vector contains a bacteriophage-specific
promoter. Typically, the promoter is a T3 promoter or a
T7 promoter. A preferred promoter is bacteriophage T3
promoter. The cleaved cDNA is inserted into the vector
in an orientation that is antisense with respect to the
bacteriophage-specific promoter.

E. Transformation of a Suitable Host Cell 

The vector into which the cleaved DNA has been
inserted is then used to transform a suitable host cell
that can be efficiently transformed or transfected by the
vector containing the insert. Suitable host cells for
cloning are described, for example, in Sambrook et al.,
"Molecular Cloning: A Laboratory Manual," supra.
Typically, the host cell is prokaryotic. A particularly
suitable host cell is a strain of E. coli. A suitable E.
coli strain is MC1061. Preferably, a small aliquot is
also used to transform E. coli strain XL1-Blue so that
the percentage of clones with inserts is determined from
the relative percentages of blue and white colonies on
X-gal plates. Only libraries with in excess of 5x10^5
recombinants are typically acceptable.


F. Generation of Linearized Fragments 

Plasmid preparations, typically as minipreps,
are then made from each of the cDNA libraries.
Linearized fragments are then generated by digestion with
at least one restriction endonuclease that is different
from the first and second restriction endonucleases
discussed above. Preferably, an aliquot of each of the
cloned inserts is divided into two pools, one of which is
cleaved with XhoI and the second with SalI. The pools of
linearized plasmids are combined, mixed, then divided
into thirds. The thirds are digested with HindIII,
BamHI, and EcoRI. This procedure is followed because, in
order to generate antisense transcripts of the inserts
with T3 RNA polymerase, the template must first be
cleaved with a restriction endonuclease that cuts within
flanking sequences but not within the inserts themselves.
Given that the average length of the 3'-terminal MspI
fragments is 256 base pairs, approximately 6% of the
inserts contain sites for any enzyme with a hexamer
recognition sequence. Those inserts would be lost to
further analysis were only a single enzyme utilized.
Hence, it is preferable to divide the reaction so that
only one of two enzymes is used for linearization of
each half reaction. Only inserts containing sites for
both enzymes (approximately 0.4%) are lost from both
halves of the samples. Similarly,
each cRNA sample is contaminated to a different extent
with transcripts from insertless plasmids, which could
lead to variability in the efficiency of the later
polymerase chain reactions for different samples because
of differential competition for primers. Cleavage of
thirds of the samples with one of three enzymes that have
single targets in pBC SK+ between its ClaI and NotI sites
eliminates the production of transcripts containing
binding sites for the eventual 5' primers in the PCR
process from insertless plasmids. The use of three
enzymes on thirds of the reaction reduces the loss of
insert-containing sequences that also contain sites for
the enzyme while solving the problem of possible
contamination by insertless sequences. If only one
enzyme were used, about 10% of the insert-containing
sequences would be lost, but this is reduced to about
0.1%, because only those sequences that contain sites
for all three enzymes are lost.
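
The loss estimates in the preceding paragraph can be checked with simple arithmetic. The sketch below is illustrative only: it assumes random sequence with equal base frequencies, approximates the chance of a site by (fragment length) / 4^(site length), and treats sites for different enzymes as independent. The function name site_probability is hypothetical. The 10% per-enzyme figure for the three-way split is taken from the text as stated rather than derived.

```python
def site_probability(fragment_length, site_length):
    """Approximate chance that a random fragment contains a given site.

    Uses expected site count (length / 4**site_length) as the estimate,
    which is reasonable while the value is small.
    """
    return min(1.0, fragment_length / 4 ** site_length)


# A 256 bp average MspI fragment and a hexamer-recognizing enzyme.
p_single = site_probability(256, 6)

# Two independent enzymes on halves: lost only if both sites are present.
p_both = p_single ** 2

# Three enzymes on thirds, using the 10% per-enzyme loss stated in the text.
p_three = 0.10 ** 3

print(f"one enzyme:    {p_single:.1%}")
print(f"two enzymes:   {p_both:.1%}")
print(f"three enzymes: {p_three:.1%}")
```

Under these assumptions the single-enzyme estimate is about 6% and the two-enzyme estimate about 0.4%, in line with the figures quoted above.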

G. Generation of cRNA

The next step is the generation of a cRNA
preparation of antisense cRNA transcripts. This is
performed by incubation of the linearized fragments with
an RNA polymerase capable of initiating transcription
from the bacteriophage-specific promoter. Typically, as
discussed above, the promoter is a T3 promoter, and the
polymerase is therefore T3 RNA polymerase. The
polymerase is incubated with the linearized fragments and
the four ribonucleoside triphosphates under conditions
suitable for synthesis.

H. Transcription of First-Strand cDNA 

The cRNA preparation is then divided into
sixteen subpools. First-strand cDNA is then transcribed
from each subpool, using a thermostable reverse
transcriptase and a primer as described below. A
preferred transcriptase is the recombinant reverse
transcriptase from Thermus thermophilus, known as rTth,
available from Perkin-Elmer (Norwalk, CT). This enzyme
is also known as an RNA-dependent DNA polymerase. With
this reverse transcriptase, annealing is performed at
60°C and the transcription reaction at 70°C. This
promotes high fidelity complementarity between the primer
and the cRNA. The primer used is one of the sixteen
primers whose 3'-terminus is -N-N, wherein N is one of
the four deoxyribonucleotides A, C, G, or T, the primer
being at least 15 nucleotides in length, corresponding in
sequence to the 3'-end of the bacteriophage-specific
promoter, and extending across into at least the first
two nucleotides of the cRNA.

Where the bacteriophage-specific promoter is
the T3 promoter, the primers typically have the sequence
A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 3).

I. PCR Reaction

The next step is the use of the product of 
transcription in each of the sixteen subpools as a 
template for a polymerase chain reaction with primers as 
described below to produce polymerase chain reaction 
amplified fragments. 

The primers used are: (a) a 3'-primer that
corresponds in sequence to a sequence in the vector
adjoining the site of insertion of the cDNA sample in the
vector; and (b) a 5'-primer selected from the group
consisting of: (i) the primer from which first-strand
cDNA was made for that subpool; (ii) the primer from
which the first-strand cDNA was made for that subpool
extended at its 3'-terminus by an additional residue -N,
where N can be any of A, C, G, or T; and (iii) the primer
used for the synthesis of first-strand cDNA for that
subpool extended at its 3'-terminus by two additional
residues -N-N, wherein N can be any of A, C, G, or T.

When the vector is the plasmid pBC SK+ cleaved
with ClaI and NotI, a suitable 3'-primer is G-A-A-C-A-A-
A-A-G-C-T-G-G-A-G-C-T-C-C-A-C-C-G-C (SEQ ID NO: 4).
Where the bacteriophage-specific promoter is the T3
promoter, suitable 5'-primers have the sequences A-G-G-T-
C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 3), A-G-G-T-C-G-
A-C-G-G-T-A-T-C-G-G-N-N-N (SEQ ID NO: 5), or A-G-G-T-C-G-
A-C-G-G-T-A-T-C-G-G-N-N-N-N (SEQ ID NO: 6).

Typically, PCR is performed in the presence of
35S-dATP using a PCR program of 15 seconds at 94°C for
denaturation, 15 seconds at 60°C for annealing, and 30
seconds at 72°C for synthesis on a Perkin-Elmer 9600
apparatus (Perkin-Elmer Cetus, Norwalk, CT). The high
temperature annealing step minimizes artifactual
mispriming by the 5'-primer at its 3'-end and promotes
high fidelity copying.

Alternatively, the PCR amplification can be
carried out in the presence of a 32P-labeled
deoxyribonucleoside triphosphate, such as [32P]dCTP.
However, it is generally preferred to use a 35S-labeled
deoxyribonucleoside triphosphate for maximum resolution.
Other detection methods, including nonradioactive labels,
can also be used.

This series of reactions produces 16, 64, and
256 product pools for the three sets of 5'-primers. It
produces 16 product pools for the 5'-primer that is the same
as the primer from which first-strand cDNA was made. It
produces 64 product pools for the primer extended at its
3'-terminus by an additional residue -N, where N can be
any of the four nucleotides. It produces 256 product pools
for the primer extended at its 3'-terminus by two
additional residues -N-N, where N again can be any of the
four nucleotides.

The process of the present invention can be
extended by using longer sets of 5'-primers extended at
their 3'-end by additional nucleotides. For example, a
primer with the 3'-terminus -N-N-N-N-N would give 1024
products.
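
The pool counts quoted above follow from enumerating every 3'-terminal extension of the shared primer sequence. The following Python sketch illustrates this for the T3-based primers of SEQ ID NOs: 3, 5, and 6; the names T3_CORE and primer_panel are hypothetical and introduced only for this illustration.

```python
from itertools import product

# Shared 5' portion of SEQ ID NOs: 3, 5, and 6 (name is illustrative).
T3_CORE = "AGGTCGACGGTATCGG"


def primer_panel(extension_length):
    """Return every primer formed by appending N residues to the core."""
    return [
        T3_CORE + "".join(ext)
        for ext in product("ACGT", repeat=extension_length)
    ]


# 2, 3, and 4 terminal N residues correspond to SEQ ID NOs: 3, 5, and 6;
# 5 residues is the extended case mentioned in the text.
panel_sizes = {k: len(primer_panel(k)) for k in (2, 3, 4, 5)}
print(panel_sizes)
```

Each additional N residue multiplies the number of primers, and hence product pools, by four.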



J. 



The polymerase chain reaction amplified
fragments are then resolved by electrophoresis to display
bands representing the 3'-ends of mRNAs present in the
sample.

Electrophoretic techniques for resolving PCR 
amplified fragments are well-understood in the art and 
need not be further recited here. The corresponding 
products are resolved in denaturing DNA sequencing gels
and visualized by autoradiography. For the particular
vector system described herein, the gels are run so that 
the first 140 base pairs run off their bottom, since 
vector-related sequences increase the length of the cDNAs 
by 140 base pairs. This number can vary if other vector 
systems are employed, and the appropriate electrophoresis 
conditions so that vector-related sequences run off the 
bottom of the gels can be determined from a consideration 
of the sequences of the vector involved. Typically, each 
reaction is run on a separate denaturing gel, so that at 
least two gels are used. It is preferred to perform a 
series of reactions in parallel, such as from different 
tissues, and resolve all of the reactions using the same 
primer on the same gel. A substantial number of 
reactions can be resolved on the same gel. Typically, as 
many as thirty reactions can be resolved on the same gel 
and compared. As discussed below, this provides a way of 
determining tissue-specific mRNAs. 

Typically, autoradiography is used to detect 
the resolved cDNA species. However, other detection 
methods, such as phosphorimaging or fluorescence, can 
also be used, and may provide higher sensitivity in 
certain applications. 

According to the scheme, the cDNA libraries
produced from each of the mRNA samples contain copies of
the extreme 3'-ends from the most distal site for MspI to
the beginning of the poly(A) tail of all poly(A)+ mRNAs
in the starting RNA sample approximately according to the
initial relative concentrations of the mRNAs. Because
both ends of the inserts for each species are exactly
defined by sequence, their lengths are uniform for each
species, allowing their later visualization as discrete
bands on a gel, regardless of the tissue source of the
mRNA.
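
Why the insert length is fixed for each species can be illustrated with a short sketch. It assumes the cDNA is written in the mRNA sense with the poly(A) tail already removed, and that MspI cleaves C^CGG so that the retained fragment begins with CGG. The function name distal_mspi_fragment and the test sequences are hypothetical.

```python
MSPI_SITE = "CCGG"  # MspI recognition sequence; cleavage is after the first C


def distal_mspi_fragment(cdna_body):
    """Return the fragment from the most 3' MspI site to the 3' end.

    cdna_body is the sense-strand sequence without its poly(A) tail.
    Returns None when the sequence contains no MspI site.
    """
    index = cdna_body.rfind(MSPI_SITE)
    if index == -1:
        return None
    # Cleavage after the first C leaves CGG at the start of the fragment.
    return cdna_body[index + 1:]


# Two copies of one species that differ only upstream of the distal site
# (for example, because of incomplete reverse transcription).
full_copy = "AAACCGGTTTCCGGACGTAC"
short_copy = "TTCCGGACGTAC"
print(distal_mspi_fragment(full_copy))
print(distal_mspi_fragment(short_copy))
```

Both copies yield the same fragment, which is why each species appears as a single discrete band.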

The use of successive steps with lengthening
primers to survey the cDNAs essentially acts like a nested
PCR. These steps enhance quality control and diminish
the background that potentially could result from
amplification of untargeted cDNAs. In a preferred
embodiment, the second reverse transcription step
subdivides each cRNA sample into sixteen subpools,
utilizing a primer that anneals to the sequences derived
from pBC SK+ but extends across the CGG of the
non-regenerated MspI site and including two nucleotides
(-N-N) of the insert. This step segregates the starting
population of potentially 50,000 to 100,000 mRNAs into
sixteen subpools of approximately 3,000 to 6,000 members
each. In serial iterations of the subsequent PCR step,
in which radioactive label is incorporated into the
products for their autoradiographic visualization, those
pools are further segregated by division into four or
sixteen subsubpools by using progressively longer
5'-primers containing three or four nucleotides of the
insert.
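
The successive subdivision can be sketched as a lookup on the first nucleotides of each insert. This illustration assumes the insert is written in the mRNA sense and begins with the CGG left by MspI cleavage; the function name subpool_keys and the example sequence are hypothetical.

```python
def subpool_keys(insert):
    """Return the 2-, 3-, and 4-nucleotide keys that select an insert.

    The keys are the nucleotides immediately following the leading CGG,
    matching the -N-N, -N-N-N, and -N-N-N-N primer termini.
    """
    if not insert.startswith("CGG") or len(insert) < 7:
        raise ValueError("insert must begin with CGG plus four nucleotides")
    tail = insert[3:7]
    return tail[:2], tail[:3], tail


two, three, four = subpool_keys("CGGCTGCAATGC")
print(two, three, four)

# Expected subpool sizes if the species were spread evenly over 16 subpools.
low, high = 50_000 // 16, 100_000 // 16
print(low, high)
```

An even split of 50,000 to 100,000 species over sixteen subpools gives roughly 3,000 to 6,000 per subpool, as stated above; real distributions would not be perfectly even.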

By first demanding by high temperature
annealing a high fidelity 3'-end match at the reverse
transcription step in the -N-N positions, and
subsequently demanding again such high fidelity matching
into -N-N-N or -N-N-N-N iterations, bleedthrough from
mismatched priming at the -N-N positions is drastically
minimized.

The steps of the process beginning with
dividing the cRNA preparation into sixteen subpools and
transcribing first-strand cDNA from each subpool can be
performed separately as a method of simultaneous
sequence-specific identification of mRNAs corresponding
to members of an antisense cRNA pool representing the
3'-ends of a population of mRNAs.

II. APPLICATIONS OF THE METHOD FOR DISPLAY OF mRNA 
PATTERNS 

The method described above for the detection of 
patterns of mRNA expression in a tissue and the resolving 
of these patterns by gel electrophoresis has a number of 
applications. One of these applications is its use for 
the detection of a change in the pattern of mRNA 
expression in a tissue associated with a physiological or 
pathological change. In general, this method comprises: 

(1) obtaining a first sample of a tissue that 
is not subject to the physiological or pathological 
change; 

(2) determining the pattern of mRNA expression 
in the first sample of the tissue by performing the 
method of simultaneous sequence-specific identification 
of mRNAs corresponding to members of an antisense cRNA 
pool representing the 3'-ends of a population of mRNAs as 
described above to generate a first display of bands 
representing the 3'-ends of mRNAs present in the first
sample; 

(3) obtaining a second sample of the tissue 
that has been subject to the physiological or 
pathological change; 

(4) determining the pattern of mRNA expression
in the second sample of the tissue by performing the
method of simultaneous sequence-specific identification
of mRNAs corresponding to members of an antisense cRNA
pool representing the 3'-ends of a population of mRNAs as
described above to generate a second display of bands
representing the 3'-ends of mRNAs present in the second
sample; and

(5) comparing the first and second displays to
determine the effect of the physiological or pathological
change on the pattern of mRNA expression in the tissue.

Typically, the comparison is made in adjacent 
lanes of a single gel. 
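
The comparison in step (5) can be thought of as a lane-by-lane set comparison of band positions. The sketch below is one possible way to record it, not a procedure given in this description; the function name compare_displays, the lane label, and the band lengths are all hypothetical.

```python
def compare_displays(first, second):
    """Compare two band displays lane by lane.

    Each display maps a primer-pair label to a set of band lengths in
    nucleotides. For each lane, report the bands seen only in the first
    sample, only in the second sample, and in both.
    """
    result = {}
    for lane in sorted(set(first) | set(second)):
        a = first.get(lane, set())
        b = second.get(lane, set())
        result[lane] = {
            "only_first": sorted(a - b),
            "only_second": sorted(b - a),
            "shared": sorted(a & b),
        }
    return result


# Hypothetical band lengths for one primer pair in two samples.
unchanged_tissue = {"118-111": {152, 210, 305}}
changed_tissue = {"118-111": {152, 305, 340}}
diff = compare_displays(unchanged_tissue, changed_tissue)
print(diff["118-111"])
```

Bands listed under only_first or only_second are candidates for mRNAs whose expression differs between the two samples; differences in band intensity would need a separate, quantitative comparison.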

The tissue can be derived from the central
nervous system. In particular, it can be derived from a
structure within the central nervous system that is the
retina, cerebral cortex, olfactory bulb, thalamus,
hypothalamus, anterior pituitary, posterior pituitary,
hippocampus, nucleus accumbens, amygdala, striatum,
cerebellum, brain stem, suprachiasmatic nucleus, or
spinal cord. When the tissue is derived from the central
nervous system, the physiological or pathological change
can be any of Alzheimer's disease, parkinsonism,
ischemia, alcohol addiction, drug addiction,
schizophrenia, amyotrophic lateral sclerosis, multiple
sclerosis, depression, and bipolar manic-depressive
disorder. Alternatively, the method of the present
invention can be used to study circadian variation,
aging, or long-term potentiation, the latter affecting
the hippocampus. Additionally, particularly with
reference to mRNA species occurring in particular
structures within the central nervous system, the method
can be used to study brain regions known to be
involved in complex behaviors, such as learning and
memory, emotion, addiction, and glutamate neurotoxicity.


This method can also be used to study the
results of the administration of drugs and/or toxins to
an individual by comparing the mRNA pattern of a tissue
before and after the administration of the drug or toxin.
Results of electroshock therapy can also be studied.

Alternatively, the tissue can be from an organ 
or organ system that includes the cardiovascular system, 
the pulmonary system, the digestive system, the 
peripheral nervous system, the liver, the kidney, 
skeletal muscle, and the reproductive system, or from any 
other organ or organ system of the body. For example, 
mRNA patterns can be studied from liver, heart, kidney, 
or skeletal muscle. Additionally, for any tissue, 
samples can be taken at various times so as to discover a 
circadian effect of mRNA expression. Thus, this method 
can ascribe particular mRNA species to involvement in 
particular patterns of function or malfunction. 

The antisense cRNA pool representing the
3'-ends of mRNAs can be generated by steps (1)-(4) of the
method as described above in Section I.

Similarly, the mRNA resolution method of the 
present invention can be used as part of a method of 
screening for a side effect of a drug. In general, such 
a method comprises: 

(1) obtaining a first sample of tissue from an 
organism treated with a compound of known physiological 
function; 

(2) determining the pattern of mRNA expression 
in the first sample of the tissue by performing the 
method of simultaneous sequence-specific identification 
of mRNAs corresponding to members of an antisense cRNA 
pool representing the 3'-ends of a population of mRNAs, 
as described above, to generate a first display of bands
representing the 3'-ends of mRNAs present in the first
sample;

(3) obtaining a second sample of tissue from
an organism treated with a drug to be screened for a side
effect;

(4) determining the pattern of mRNA expression
in the second sample of the tissue by performing the
method of simultaneous sequence-specific identification
of mRNAs corresponding to members of an antisense cRNA
pool representing the 3'-ends of a population of mRNAs,
as described above, to generate a second display of bands
representing the 3'-ends of mRNAs present in the second
sample; and

(5) comparing the first and second displays in
order to detect the presence of mRNA species whose
expression is not affected by the known compound but is
affected by the drug to be screened, thereby indicating a
difference in action of the drug to be screened and the
known compound and thus a side effect.

In particular, this method can be used for
drugs affecting the central nervous system, such as
antidepressants, neuroleptics, tranquilizers,
anticonvulsants, monoamine oxidase inhibitors, and
stimulants. However, this method can in fact be used for
any drug that may affect mRNA expression in a particular
tissue. For example, the effect on mRNA expression of
anti-parkinsonism agents, skeletal muscle relaxants,
analgesics, local anesthetics, cholinergics,
antispasmodics, steroids, non-steroidal anti-inflammatory
drugs, antiviral agents, or any other drug capable of
affecting mRNA expression can be studied, and the effect
determined in a particular tissue or structure.

A further application of the method of the
present invention is in obtaining the sequence of the
3'-ends of mRNA species that are displayed. In general, a
method of obtaining the sequence comprises:

(1) eluting at least one cDNA corresponding to
a mRNA from an electropherogram in which bands
representing the 3'-ends of mRNAs present in the sample
are displayed;

(2) amplifying the eluted cDNA in a polymerase 
chain reaction; 

(3) cloning the amplified cDNA into a plasmid; 

(4) producing DNA corresponding to the cloned 
DNA from the plasmid; and 

(5) sequencing the cloned cDNA. 

The cDNA that has been excised can be amplified 
with the primers previously used in the PCR step. The 
cDNA can then be cloned into pCR II (Invitrogen, San 
Diego, CA) by TA cloning and ligation into the vector. 
Minipreps of the DNA can then be produced by standard 
techniques from subclones and a portion denatured and 
split into two aliquots for automated sequencing by the 
dideoxy chain termination method of Sanger. A 
commercially available sequencer can be used, such as an
ABI sequencer, for automated sequencing. This will allow
the determination of complementary sequences for most 
cDNAs studied, in the length range of 50-500 bp, across 
the entire length of the fragment. 

These partial sequences can then be used to
scan genomic data bases such as GenBank to recognize
sequence identities and similarities using programs such
as BLASTN and BLASTX. Because this method generates
sequences from only the 3'-ends of mRNAs, it is expected
that open reading frames (ORFs) would be encountered only
occasionally, as the 3'-untranslated regions of brain
mRNAs are on average longer than 1300 nucleotides (J.G.
Sutcliffe, supra). Potential ORFs can be examined for
signature protein motifs.

The cDNA sequences obtained can then be used to
design primer pairs for semiquantitative PCR to confirm
tissue expression patterns. Selected products can also
be used to isolate full-length cDNA clones for further
analysis. Primer pairs can be used for SSCP-PCR (single
strand conformation polymorphism-PCR) amplification of
genomic DNA. For example, such amplification can be
carried out from a panel of interspecific backcross mice
to determine linkage of each PCR product to markers
already linked. This can result in the mapping of new
genes and can serve as a resource for identifying
candidates for mapped mouse mutant loci and homologous
human disease genes. SSCP-PCR uses synthetic
oligonucleotide primers that amplify, via PCR, a small
(100-200 bp) segment (M. Orita et al., "Detection of
Polymorphisms of Human DNA by Gel Electrophoresis as
Single-Strand Conformation Polymorphisms," Proc. Natl.
Acad. Sci. USA 86: 2766-2770 (1989); M. Orita et al.,
"Rapid and Sensitive Detection of Point Mutations and DNA
Polymorphisms Using the Polymerase Chain Reaction,"
Genomics 5: 874-879 (1989)).

The excised fragments of cDNA can be
radiolabeled by techniques well-known in the art for use
in probing a northern blot or for in situ hybridization
to verify mRNA distribution and to learn the size and
prevalence of the corresponding full-length mRNA. The
probe can also be used to screen a cDNA library to
isolate clones for more reliable and complete sequence
determination. The labeled probes can also be used for
any other purpose, such as studying in vitro expression.

Another aspect of the present invention is
panels and degenerate mixtures of primers suitable for
the practice of the present invention. These include:

(1) a panel of primers comprising 16 primers of
the sequence A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N (SEQ ID
NO: 3), wherein N is one of the four deoxyribonucleotides
A, C, G, or T;

(2) a panel of primers comprising 64 primers of
the sequences A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N (SEQ
ID NO: 5), wherein N is one of the four
deoxyribonucleotides A, C, G, or T;

(3) a panel of primers comprising 256 primers 
of the sequences A-G-G-T-C-G-A-C-G-G-T-A-T-C-G-G-N-N-N-N 
(SEQ ID NO: 6), wherein N is one of the four 
deoxyribonucleotides A, C, G, or T; and 

(4) a panel of primers comprising 12 primers 
of the sequences A-A-C-T-G-G-A-A-G-A-A-T-T-C-G-C-G-G-C-C- 
G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-V-N 
(SEQ ID NO: 2) , wherein V is a deoxyribonucleotide 
selected from the group consisting of A, C, and G; and N 
is a deoxyribonucleotide selected from the group 
consisting of A, C, G, and T; and 

(5) a degenerate mixture of primers comprising 
a mixture of 12 primers of the sequences A-A-C-T-G-G-A-A- 
G-A-A-T-T-C-G-C-G-G-C-C-G-C-A-G-G-A-A-T-T-T-T-T-T-T-T-T- 
T-T-T-T-T-T-T-T-T-V-N (SEQ ID NO: 2), wherein V is a 
deoxyribonucleotide selected from the group consisting of 
A, C, and G; and N is a deoxyribonucleotide selected from 
the group consisting of A, C, G, and T, each of the 12 
primers being present in about an equimolar quantity. 

The invention is illustrated by the following
Example. The Example is for illustrative purposes only
and is not intended to limit the invention.

Resolution of Brain mRNAs Using Primers Corresponding to
Sequences of Known Brain mRNAs of Different
Concentrations

To demonstrate the effectiveness of the method 
of the present invention, it was applied using 5'-primers
extended at their 3'-ends by two nucleotides and
corresponding to the sequence of known brain mRNAs of
different concentrations, such as neuron-specific enolase
(NSE) at roughly 0.5% concentration (S. Forss-Petter et
al., "Neuron-Specific Enolase: Complete Structure of Rat
mRNA, Multiple Transcriptional Start Sites and Evidence
for Translational Control," J. Neurosci. Res. 16: 141-156
(1986)), RC3 at about 0.01%, and somatostatin at 0.001%
(G.H. Travis & J.G. Sutcliffe, "Phenol Emulsion-Enhanced
DNA-Driven Subtractive cDNA Cloning: Isolation of Low-
Abundance Monkey Cortex-Specific mRNAs," Proc. Natl.
Acad. Sci. USA 85: 1696-1700 (1988)) to compare cDNAs
made from libraries constructed from cerebral cortex, 
striatum, cerebellum and liver RNAs made as described 
above. On short autoradiographic exposures from any 
particular RNA sample, 50-100 bands were obtained. Bands 
were absolutely reproducible in duplicate samples. 
Approximately two-thirds of the bands differed between 
brain and liver samples, including the bands of the 
correct lengths corresponding to the known brain-specific 
mRNAs. This was confirmed by excision of the bands from 
the gels, amplification and sequencing. Only a few bands 
differed among samples for various brain regions for any 
particular primer, although some band intensities 
differed. 

The band corresponding to NSE, a relatively 
prevalent mRNA species, appeared in all of the brain 
samples but not in the liver sample, and was not 
observed when any of the last three single nucleotides 
within the four-base 3'-terminal sequence -N-N-N-N was 
changed in the synthetic 5'-primer. When the first N was 
changed, a small amount of bleedthrough was detected. For 
the known species, the intensity of the autoradiographic 
signal was roughly proportional to mRNA prevalence, and 
mRNAs with concentrations of one part in 10^5 or greater of 
the poly(A)+ RNA were routinely visible, with the 
occasional problem that cDNAs that migrated close to more 
intense bands were obscured. 
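
As a rough arithmetic aid, the percent prevalences quoted above for the three known species can be restated in "one part in N" form. The following is a minimal sketch only; the names PREVALENCE_PERCENT and one_part_in are hypothetical, and the values are the approximate figures given in the text rather than new measurements.

```python
from fractions import Fraction

# Approximate prevalences quoted in the text, as percent of total mRNA.
# The dictionary name and helper below are illustrative, not from the source.
PREVALENCE_PERCENT = {
    "NSE": Fraction("0.5"),
    "RC3": Fraction("0.01"),
    "somatostatin": Fraction("0.001"),
}


def one_part_in(percent):
    """Convert a percent prevalence to N, where the species is one part in N."""
    return int(Fraction(100) / percent)


for name, pct in PREVALENCE_PERCENT.items():
    print(f"{name}: about one part in {one_part_in(pct):,}")
```

On these figures, the three species span nearly three orders of magnitude in prevalence, which is the range over which bands were compared in this example.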



A sample of the data is shown in Figure 2. In 
the 5 gel lanes on the left, cortex cRNA was substrate 
for reverse transcription with the primer A-G-G-T-C-G-A- 
C-G-G-T-A-T-C-G-G-N-N (SEQ ID NO: 3), where -N-N is -C-T 
(primer 118), -[illegible]-T (primer 116) or -C-G (primer 
106). The PCR amplification used primers A-G-G-T-C-G-A- 
C-G-G-T-A-T-C-G-G-N-N-N-N (SEQ ID NO: 5), where -N-N-N-N 
is -C-T-[illegible]-C (primer 128), -C-T-G-A (primer 127), 
-C-T-G-C (primer 111), -[illegible]-T-G-C (primer 134), and 
-C-G-G-C (primer 130), as indicated in Figure 2. Primers 
118 and 111 match the sequence of the two and four 
nucleotides, respectively, downstream from the MspI site 
located the nearest the 3'-end of the NSE sequence. 
Primer 127 is mismatched with the NSE sequence in the 
last position, primer 128 in the next-to-last position, 
primers 106 and 130 in the -3 position, and primers 116 
and 134 in the -4 position. Primer 134 extended two 
nucleotides further upstream than the others shown here, 
hence its PCR products are two nucleotides longer 
relative to the products in other lanes. 
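
The position numbering used above, counted back from the 3' end of the primer, can be illustrated with a short sketch. This is a minimal illustration under stated assumptions: the four-base extensions shown for primers 111, 127 and 130 are assumed readings of the primer list, and the function name mismatch_positions is hypothetical.

```python
def mismatch_positions(reference, candidate):
    """Return mismatch positions counted from the 3' end (-1 is the last base)."""
    if len(reference) != len(candidate):
        raise ValueError("extensions must have equal length")
    n = len(reference)
    return [i - n for i in range(n) if reference[i] != candidate[i]]


# Assumed four-base 3' extensions; primer 111 is taken as the matched reference.
REFERENCE_111 = "CTGC"
CANDIDATES = {"127": "CTGA", "130": "CGGC"}

for primer, extension in CANDIDATES.items():
    print(f"primer {primer}: mismatch at {mismatch_positions(REFERENCE_111, extension)}")
```

Under these assumed readings, the sketch places the primer 127 mismatch at the last position and the primer 130 mismatch at the -3 position, in agreement with the description.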

In each lane, 50-100 bands were visible in 15- 
minute exposures using 32P-dCTP to radiolabel the 
products. These bands were apparently distinct for each 
primer pair, with the exception that a subset of the 118- 
111 bands appeared more faintly in the 116-134 lane, 
trailing by two nucleotides, indicating bleedthrough in 
the -4 position. 

The 118-111 primer set was used again on 
separate cortex (CX) and liver (LV) cRNAs. The cortex 
pattern was identical to that in lane 118-111, 
demonstrating reproducibility. The liver pattern 
differed from CX in the majority of species. The 
asterisk indicates the position of the NSE product. 
Analogous primer sets detected RC3 and somatostatin 
(somat) products (asterisks) in CX but not LV lanes. The 
relative band intensities of a given PCR product can be 
compared within lanes using the same primer set, but not 
different sets. 

This example demonstrates the feasibility and 
reproducibility of the method of the present invention 
and its ability to resolve different mRNAs. It further 
demonstrates that prevalence of particular mRNA species 
can be estimated from the intensity of the 
autoradiographic signal. The assay allows mRNAs present 
in both high and low prevalence to be detected 
simultaneously. 

ADVANTAGES OF THE PRESENT INVENTION 

The present method can be used to identify 
genes whose expression is altered during neuronal 
development, in models of plasticity and regeneration, in 
response to chemical or electrophysiological challenges 
such as neurotoxicity and long-term potentiation, and in 
response to behavioral, viral, drug/alcohol paradigms, 
the occurrence of cell death or apoptosis, aging, 
pathological conditions, and other conditions affecting 
mRNA expression. Although the method is particularly 
useful for studying gene expression in the nervous 
system, it is not limited to the nervous system and can be 
used to study mRNA expression in any tissue. The method 
allows the visualization of nearly every mRNA expressed 
by a tissue as a distinct band on a gel whose intensity 
corresponds roughly to the concentration of the mRNA. 

The method has the advantage that it does not 
depend on potentially irreproducible mismatched random 
priming, so that it provides a high degree of accuracy 
and reproducibility. Moreover, it reduces the 
complications and imprecision generated by the presence 
of concurrent bands of different length resulting from 
the same mRNA species as the result of different priming 
events. In methods using random priming, such concurrent 
bands can occur and are more likely to occur for mRNA 
species of high prevalence. In the present method, such 
concurrent bands are avoided. 
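
One way to see why a single band per mRNA species is expected is that the 5' end of each product is fixed at a defined restriction site rather than at a variable priming position. The following is a minimal sketch, assuming MspI (recognition sequence CCGG) is the enzyme and that the product begins at the MspI site nearest the 3' end; the function name and the example sequence are hypothetical and serve only as illustration.

```python
MSPI_SITE = "CCGG"  # MspI recognition sequence


def three_prime_fragment(cdna):
    """Return the cDNA from its 3'-most MspI site to the 3' end, or None if no site."""
    index = cdna.rfind(MSPI_SITE)
    if index == -1:
        return None
    return cdna[index:]


# Hypothetical cDNA with two MspI sites followed by a short poly(A) tract.
example = "ATGCCGGTTACGCCGGCTGCAATTGAAAAAAAA"
fragment = three_prime_fragment(example)
print(fragment, len(fragment))
```

Because only the site nearest the 3' end defines the fragment, a given sequence yields one fragment of one length no matter how many upstream sites it contains.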

The method provides sequence-specific 
information about the mRNA species and can be used to 
generate primers, probes, and other specific sequences. 

Although the present invention has been 
described in considerable detail, with reference to 
certain preferred versions thereof, other versions are 
possible. Therefore, the spirit and scope of the 
appended claims should not be limited to the description 
of the preferred versions contained herein. 